
FIGURE 13-6: Results of two raters reading the same set of 50 specimens and rating each specimen yes or no.

Looking at Figure 13-6, cell a contains a count of how many scans were rated yes (there is a tumor) by both Rater 1 and Rater 2. Cell b counts how many scans were rated yes by Rater 1 but no by Rater 2. Cell c counts how many scans were rated no by Rater 1 and yes by Rater 2, and cell d counts the scans that both raters rated no. Cells a and d are called concordant because the raters agreed, and cells b and c are called discordant because the raters disagreed.
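If you have the two raters’ calls stored in a pair of lists, tallying the four cells takes only a few lines of code. Here’s a minimal sketch in Python (the function name and the example lists are my own illustration, not the actual Figure 13-6 data):

```python
def fourfold_counts(rater1, rater2):
    """Tally paired yes/no ratings into the cells a, b, c, d of a fourfold table."""
    a = b = c = d = 0
    for r1, r2 in zip(rater1, rater2):
        if r1 == "yes" and r2 == "yes":
            a += 1      # concordant: both raters say yes
        elif r1 == "yes" and r2 == "no":
            b += 1      # discordant: Rater 1 yes, Rater 2 no
        elif r1 == "no" and r2 == "yes":
            c += 1      # discordant: Rater 1 no, Rater 2 yes
        else:
            d += 1      # concordant: both raters say no
    return a, b, c, d

# Hypothetical example with five specimens (not the book's data)
print(fourfold_counts(["yes", "yes", "no", "no", "yes"],
                      ["yes", "no", "no", "no", "yes"]))   # -> (2, 1, 0, 2)
```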

Ideally, all the scans would be counted in the concordant cells a or d of Figure 13-6, and the discordant cells b and c would contain zeros. A measure of how close the data come to this ideal is called Cohen’s Kappa, signified by the Greek lowercase kappa: κ. You calculate kappa as

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

where $P_o = (a + d)/N$ is the proportion of scans on which the two raters actually agreed, $P_e = [(a + b)(a + c) + (c + d)(b + d)]/N^2$ is the proportion of agreement you’d expect from chance alone, and $N = a + b + c + d$ is the total number of scans. For the data in Figure 13-6, plugging the cell counts into this formula gives κ = 0.5138.
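In code, that calculation is just a few lines. The sketch below is a plain-Python illustration; the cell counts are made up, so the result differs from the 0.5138 above (substitute the actual Figure 13-6 counts to reproduce it):

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a fourfold (2 x 2) table of paired yes/no ratings."""
    n = a + b + c + d
    observed = (a + d) / n                                      # agreement actually seen
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # agreement expected by chance
    return (observed - expected) / (1 - expected)

# Hypothetical counts for 50 specimens (NOT the Figure 13-6 data)
print(round(cohens_kappa(10, 5, 6, 29), 4))   # prints roughly 0.486
```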

How is this interpreted?

If the raters are in perfect agreement, then κ = 1, and if you generate completely random ratings, you’ll see a κ close to 0. You may think this means κ always takes on a value between 0 and 1, but random sampling fluctuations can actually cause κ to be negative. This situation can be compared to a student taking a true/false test where the number of wrong answers is subtracted from the number of right answers as a penalty for guessing: a κ less than zero means the raters agreed even less often than chance alone would produce, and the statistic penalizes them for it!
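If you want to see that behavior for yourself, a quick simulation makes it concrete. This sketch (my own illustration, not an example from the book) has two imaginary raters flip coins for 50 specimens, many times over, and counts how often κ lands below zero by chance alone:

```python
import random

random.seed(1)
trials = 10_000
negatives = 0
for _ in range(trials):
    # Two raters who each "rate" 50 specimens by coin flip
    r1 = [random.choice("YN") for _ in range(50)]
    r2 = [random.choice("YN") for _ in range(50)]
    a = sum(x == "Y" and y == "Y" for x, y in zip(r1, r2))
    b = sum(x == "Y" and y == "N" for x, y in zip(r1, r2))
    c = sum(x == "N" and y == "Y" for x, y in zip(r1, r2))
    d = sum(x == "N" and y == "N" for x, y in zip(r1, r2))
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    if (observed - expected) / (1 - expected) < 0:
        negatives += 1

print(f"kappa came out negative in {negatives / trials:.0%} of simulated studies")
```

With purely random ratings, κ scatters around zero, so roughly half of the simulated studies produce a negative value.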

So, how do you interpret a κ of 0.5138? There’s no universal agreement as to an acceptable value for κ. One common convention is that values of κ less than 0.4 are considered poor, values between 0.4 and 0.75 are acceptable, and values greater than 0.75 are excellent. By that convention, the raters in this example are performing acceptably.
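If you calculate κ for many pairs of raters, you may want to attach those verbal labels automatically. Here’s a tiny sketch using the cutoffs just described (remember, these cutoffs are one common convention, not a universal standard):

```python
def kappa_label(kappa):
    """Label a kappa value using one common convention (not a universal standard)."""
    if kappa < 0.4:
        return "poor"
    elif kappa <= 0.75:
        return "acceptable"
    else:
        return "excellent"

print(kappa_label(0.5138))   # -> "acceptable"
```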

You won’t find an easy formula for CIs around κ, but the fourfold table web page (https://statpages.info/ctab2x2.html) provides approximate CIs. For the preceding example, the 95 percent CI is 0.202 to 0.735. So you would report that your two raters’ agreement was 0.514 (95 percent CI 0.202 to 0.735), which suggests that the agreement level was acceptable.
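The web page does this calculation for you, but if you’d rather stay in code, a bootstrap gives a rough approximation: resample the 50 paired ratings with replacement, recompute κ each time, and take the 2.5th and 97.5th percentiles. The sketch below uses made-up cell counts (not the Figure 13-6 data, so its interval won’t match 0.202 to 0.735):

```python
import random

def cohens_kappa(a, b, c, d):
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical cell counts (NOT the Figure 13-6 data): rebuild the 50 paired
# ratings, then bootstrap them for an approximate 95 percent CI around kappa.
a, b, c, d = 10, 5, 6, 29
pairs = [("Y", "Y")] * a + [("Y", "N")] * b + [("N", "Y")] * c + [("N", "N")] * d

random.seed(1)
boot = []
for _ in range(2000):
    sample = [random.choice(pairs) for _ in range(len(pairs))]
    boot.append(cohens_kappa(sample.count(("Y", "Y")),
                             sample.count(("Y", "N")),
                             sample.count(("N", "Y")),
                             sample.count(("N", "N"))))

boot.sort()
print(f"approximate 95% CI: {boot[49]:.3f} to {boot[1949]:.3f}")   # 2.5th and 97.5th percentiles
```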